Methods and systems for estimating usage of a simultaneous multi-threading processor

ABSTRACT

Methods and systems are disclosed for determining a CPU usage adjustment factor and for automatically applying the CPU usage adjustment factor to provide a CPU usage estimate for an SMT processor. In one implementation, the methods and systems obtain samples of CPU usage reported by the operating system at a predefined sampling rate over a predefined sampling interval. Thread states for the threads substantially corresponding to the reported CPU usage are also obtained at the predefined sampling rate and over the predefined sampling interval. This sampling may be performed for servers running different applications and having diverse processing loads. An estimate of the distribution of the number of threads running for the CPU usages reported may then be determined from the sampled data. A CPU usage adjustment factor may then be derived, based on the distribution, and used to provide a CPU usage estimate.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/787,232 filed on May 25, 2010, now U.S. Pat. No. 8,027,808 issued on Sep. 27, 2011, which is a continuation of U.S. patent application Ser. No. 11/860,416 filed on Sep. 24, 2007, now U.S. Pat. No. 7,725,296 issued on May 25, 2010. This application is related in subject matter to, and incorporates herein by reference in its entirety, each of the following: U.S. patent application Ser. No. 11/860,412 filed on Sep. 24, 2007, now U.S. Pat. No. 7,680,628 issued on Mar. 16, 2010, and is entitled “Estimating Processor Throughput”; and U.S. patent application Ser. No. 11/860,419 filed on Sep. 24, 2007, now U.S. Pat. No. 7,720,643, issued on May 18, 2010, and is entitled “Estimating Processor Throughput”.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

TECHNICAL FIELD

The disclosed embodiments relate generally to estimates of processor throughput and, more specifically, to systems and methods for adjusting such throughput estimates in SMT processors to account for distortions in measured usage.

BACKGROUND

Simultaneous Multi-Threading, or SMT, is a processor design in which a single CPU can execute multiple program threads in parallel. A “thread,” as understood by those having ordinary skill in the art, is a stream of instructions (e.g., add, divide, branch, etc.) that are executed by the processor. The Pentium 4 Processor with HT Technology from Intel Corp. of Santa Clara, Calif. is an example of an SMT processor, as is the POWER5 processor from International Business Machines Corp. of Armonk, N.Y. For purposes of this description, the terms “processor” and “CPU” are used interchangeably.

In a typical SMT processor, one physical CPU can have two threads running simultaneously. This is in contrast to a dual-core processor, where there are actually two discrete physical processors combined in a single package, with each core being capable of executing one thread. Depending on the particular processor manufacturer, SMT processors can provide significant improvements in processor throughput (e.g., 20%-30% according to Intel, 30%-40% according to IBM) over traditional or single-threaded processors. The term “throughput” is used herein to mean the rate at which a processor can execute instructions, typically expressed in instructions per second.

A drawback of SMT processors is that they cause certain software operating systems to distort the level of CPU usage. For example, WINDOWS SERVER 2003 (e.g., server operating system) has been known to report significantly distorted CPU utilization on Intel SMT processors. In many cases, an SMT processor was reported by WINDOWS SERVER 2003 (e.g., server operating system) as being only 50% busy when it was actually running closer to 83% of its maximum throughput. Such a distortion of the processor's usage may result in a misperception that the processor can accept additional processing load, which may lead to sluggish or otherwise unacceptable system performance. The distortion may also adversely impact the ability of capacity planners, for example, to accurately forecast the number of servers a company may require going forward.

Accordingly, what is needed is a way to more accurately estimate CPU usage on SMT processors. More specifically, what is needed is a way to determine a CPU usage adjustment factor for SMT processors and to automatically apply the CPU usage adjustment factor to provide a more accurate CPU usage estimate for SMT processors.

SUMMARY

The disclosed embodiments are directed to methods and systems for determining a CPU usage adjustment factor and for automatically applying the CPU usage adjustment factor to provide a more accurate CPU usage estimate for an SMT processor running a thread-aware operating system. In one implementation, the methods and systems obtain samples of CPU usage as reported by the operating system at a predefined sampling rate and over a predefined sampling interval. Thread states for the threads substantially corresponding to the reported CPU usage are also obtained at the predefined sampling rate and over the predefined sampling interval. This sampling may be performed for several servers running different applications and having diverse processing loads. An estimate of the distribution of the number of threads running for the CPU usages reported may then be determined from the sampled data. A CPU usage adjustment factor may then be derived based on the distribution and used to provide a more accurate CPU usage estimate.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other advantages of the invention will become apparent from the following detailed description and upon reference to the drawings, wherein:

FIG. 1 illustrates an exemplary infrastructure for compensating CPU usage estimates according to the disclosed embodiments;

FIG. 2 illustrates a graph of measured processor throughput versus reported processor throughput according to the disclosed embodiments;

FIG. 3 illustrates an exemplary system for compensating CPU usage estimates according to the disclosed embodiments;

FIG. 4 illustrates an exemplary processor usage tuner for compensating CPU usage estimates according to the disclosed embodiments;

FIG. 5 illustrates an exemplary utility for determining an adjustment factor according to the disclosed embodiments;

FIG. 6 illustrates an exemplary method of determining an adjustment factor according to the disclosed embodiments; and

FIG. 7 illustrates an exemplary method of compensating CPU usage estimates according to the disclosed embodiments.

DETAILED DESCRIPTION

The drawings described above and the written description of specific structures and functions below are not presented to limit the scope of what has been invented or the scope of the appended claims. Rather, the drawings and written description are provided to teach any person skilled in the art to make and use the inventions for which patent protection is sought. Those skilled in the art will appreciate that not all features of a commercial embodiment of the inventions are described or shown for the sake of clarity and understanding.

Persons of skill in this art will also appreciate that the development of an actual commercial embodiment incorporating aspects of the inventions will require numerous implementation-specific decisions to achieve the developer's ultimate goal for the commercial embodiment. Such implementation-specific decisions may include, and likely are not limited to, compliance with system-related, business-related, government-related and other constraints, which may vary by specific implementation, location and from time to time. While a developer's efforts might be complex and time-consuming in an absolute sense, such efforts would be, nevertheless, a routine undertaking for those of skill in this art having benefit of this disclosure.

It should be understood that the embodiments disclosed and taught herein are susceptible to numerous and various modifications and alternative forms. Thus, the use of a singular term, such as, but not limited to, “a” and the like, is not intended as limiting of the number of items. Also, relational terms, such as, but not limited to, “top,” “bottom,” “left,” “right,” “upper,” “lower,” “down,” “up,” “side,” and the like, are used in the written description for clarity in specific reference to the drawings and are not intended to limit the scope of the invention or the appended claims.

Particular embodiments are now described with reference to block diagrams and/or operational illustrations of methods. It should be understood that each block of the block diagrams and/or operational illustrations, and combinations of blocks in the block diagrams and/or operational illustrations, may be implemented by analog and/or digital hardware, and/or computer program instructions. Computer program instructions for use with or by the embodiments disclosed herein may be written in an object-oriented programming language, conventional procedural programming language, or lower-level code, such as assembly language and/or microcode. The program may be executed entirely on a single processor and/or across multiple processors, as a stand-alone software package or as part of another software package. Such computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, ASIC, and/or other programmable data processing system.

The executed instructions may also create structures and functions for implementing the actions specified in the mentioned block diagrams and/or operational illustrations. In some alternate implementations, the functions/actions/structures noted in the drawings may occur out of the order noted in the block diagrams and/or operational illustrations. For example, two operations shown as occurring in succession, in fact, may be executed substantially concurrently or the operations may be executed in the reverse order, depending on the functionality/acts/structure involved.

Referring now to FIG. 1, an exemplary infrastructure 100 is shown for providing CPU usage estimates according to the disclosed embodiments. The exemplary infrastructure 100 is typical of infrastructures found in many companies insofar as it has numerous computing units running various types of applications. In the example shown here, the infrastructure 100 may include a front-end server 102, various backend servers 104, and one or more databases 106, all interconnected as shown. The front-end server 102 may operate as a host for a Web site to which a user 108 may connect over a network 110 to access various products and/or services. The backend servers 104 and databases 106 typically provide the actual programming and processing that support the various products and/or services offered by the Web site.

As a company grows and expands, more computers and servers may need to be added to the infrastructure 100. Computers and servers, however, are enormously expensive in terms of equipment capital, software licensing fees, support and maintenance man-hours, and other costs. Thus, a company must carefully consider and plan for a capacity expansion well before proceeding with the expansion. The planning can be made more difficult if historical trends on CPU usage are inaccurate, as may be the case with SMT processors.

Because of their simultaneous thread execution, SMT processors usually appear to an operating system as two different processors, where each can run one thread. These processors are commonly referred to as logical processors, whereas the actual processor itself is the physical processor. A physical SMT processor with one logical processor running a normal thread and the other logical processor being idle is an unshared processor. The throughput that an unshared SMT processor can deliver as a percentage of its maximum throughput is referred to herein as the Unshared SMT %. If both logical processors are running and the threads are constructive threads (and not idle threads), then the SMT processor is sharing resources between the two threads. The incremental increase in throughput resulting from having both threads running is referred to herein as the SMT Benefit %. This SMT Benefit % may vary and has been observed to be about 20%-30% for Intel SMT processors and about 30%-40% for IBM SMT processors. Conversely, the performance decrease resulting from an unshared SMT processor is referred to herein as the SMT Cost %. This SMT Cost % has been reported to be negligible (e.g., less than 1%). Based on the foregoing definitions, the unshared throughput or Unshared SMT % of one logical processor, without the SMT Cost % (Equation 1) and with it (Equation 2), may be expressed as follows:

$$\text{Unshared SMT\%} = \frac{100\%}{100\% + \text{SMT Benefit\%}} \qquad \text{(Eq. 1)}$$

$$\text{Unshared SMT\%} = \frac{100\% - \text{SMT Cost\%}}{100\% + \text{SMT Benefit\%}} \qquad \text{(Eq. 2)}$$

Thus, assuming the SMT Benefit % is 20%, as has been reported by Intel, then according to Equation 1, the Unshared SMT % is 100/120, or 83%. This means that one logical processor running by itself may actually be consuming about 83% of the physical SMT processor's throughput. Unfortunately, when only one logical processor is running, certain thread-aware operating systems, such as WINDOWS SERVER 2003 (e.g., server operating system), report the physical SMT processor's throughput as 50%, reflecting the fact that only one of the two logical processors is running. While a reported 50% CPU usage may mean that, on average, the number of running threads is equal to half the number of logical processors, the actual number of running threads at any point in time may vary. Thus, a reported 50% CPU usage may be accurate, or it may be grossly distorted, depending on the distribution of the number of running threads.
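The arithmetic of Equations 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch; the function name `unshared_smt_pct` and its parameter names are assumptions introduced here and do not appear in the disclosed embodiments.

```python
def unshared_smt_pct(smt_benefit_pct: float, smt_cost_pct: float = 0.0) -> float:
    """Unshared SMT % per Eq. 2; with smt_cost_pct == 0 this reduces to Eq. 1.

    Both arguments and the result are expressed in percent (e.g., 20.0 for 20%).
    """
    return 100.0 * (100.0 - smt_cost_pct) / (100.0 + smt_benefit_pct)


# Eq. 1 with the 20% SMT Benefit % discussed in the text: 100/120, about 83%.
print(round(unshared_smt_pct(20.0), 1))
# Eq. 2 with a hypothetical 1% SMT Cost %, which lowers the result slightly.
print(round(unshared_smt_pct(20.0, smt_cost_pct=1.0), 1))
```

Keeping the SMT Cost % as an optional argument that defaults to zero reflects the statement above that the cost has been reported to be negligible.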

The above distortion may vary with the number of physical SMT processors in use. Table 1 below illustrates the distortion for a server with two physical SMT processors (i.e., four logical processors). In Table 1, it is assumed that all five scenarios (or states) are equally likely. As can be seen, a significant difference exists between the average throughput and the reported CPU usage, with the average distortion for all five scenarios being approximately 13%.

TABLE 1
Distortion for a server with two physical SMT processors

Running   Physical CPU #1        Physical CPU #2        Average      Reported
Threads   Threads  Throughput    Threads  Throughput    Throughput   CPU        Distortion
0         0          0%          0          0%            0%           0%         0%
1         1         83%          0          0%           41%          25%        16%
2         1         83%          1         83%           83%          50%        32%
3         2        100%          1         83%           91%          75%        16%
4         2        100%          2        100%          100%         100%         0%
Averages (assuming flat distribution of number of threads)  63%          50%        13%

To correct the distortion, ideally, the distribution of the number of running threads should be determined for the CPU usages reported. However, this distribution depends largely on the behavior of the applications running on the processor and the reported level of CPU usage. Unfortunately, WINDOWS SERVER 2003 (e.g., server operating system) and similar operating systems provide no easy way to obtain this information and, as a result, other ways of obtaining the information must be used. One way to obtain the distribution information is to determine the actual thread states of the threads corresponding to the CPU usage being reported for each logical processor. Such thread state determination may be performed, for example, using a process similar to Thread State Counter provided with WINDOWS SERVER 2003 (e.g., server operating system). However, because of the sheer volume of data that would result (the processor executes millions of instructions per second), it may be desirable to limit the number of data points, for example, by only obtaining a selected few CPU usages and the thread states corresponding thereto.

As an example, in one recent implementation, the CPU usage and thread states were sampled once per second over a one-hour interval using Thread State Counter. The sampling was done for a variety of servers during a time period when CPU usage was reported between 30% and 70%. The server applications were substantially diverse and included two intranet servers, four application servers, three SQL database servers, one fax server, two file servers, one exchange server, one security server, and one active directory server. The thread states were then categorized into one of three groups: (1) overhead threads resulting from the use of the WINDOWS SERVER 2003 (e.g., server operating system) processes “Typeperf” or “Smlogsvc” to read the thread states; (2) idle threads resulting from the System Idle process; and (3) all other threads. Only the third category of threads was retained, as the other two categories may not be legitimate threads.
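The three-way categorization described above might be sketched as follows. The record layout (a process name paired with a running flag), the lowercase name sets, and the function names are all assumptions made for illustration; the actual counter output format is not specified here.

```python
# Hypothetical process-name sets; the exact instance names reported by the
# operating system's counters are an assumption, not taken from the text.
OVERHEAD_PROCESSES = {"typeperf", "smlogsvc"}
IDLE_PROCESSES = {"idle"}


def categorize(process_name: str) -> str:
    """Return 'overhead', 'idle', or 'other' for the process owning a thread."""
    name = process_name.lower()
    if name in OVERHEAD_PROCESSES:
        return "overhead"
    if name in IDLE_PROCESSES:
        return "idle"
    return "other"


def count_constructive_running(threads) -> int:
    """Count running threads in the retained third category.

    `threads` is an iterable of (process_name, is_running) pairs for one sample.
    """
    return sum(
        1 for name, is_running in threads
        if is_running and categorize(name) == "other"
    )


# One hypothetical one-second sample.
sample = [
    ("Idle", True),       # idle thread, discarded
    ("typeperf", True),   # measurement overhead, discarded
    ("sqlservr", True),   # retained and running
    ("sqlservr", False),  # retained but not running
    ("w3wp", True),       # retained and running
]
print(count_constructive_running(sample))
```

Filtering before counting keeps the measurement tooling itself from inflating the number of running threads attributed to the server's workload.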

From the data collected, a distribution function p(N, CPU %) was derived to predict the probability that exactly N threads are running given a particular level of reported CPU usage (CPU %). For any given CPU %, these probabilities sum to one over all possible values of N, as shown in Equation 3. Using this distribution, a fairly accurate estimate of the actual distribution of running threads at various levels of reported CPU usage was obtained by grouping the reported CPU usage according to the nearest 10%.

$$\sum_{N=0}^{L} p(N, \text{CPU\%}) = 1 \qquad \text{(Eq. 3)}$$
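One way such an empirical distribution could be built from the sampled data is sketched below. The sample format (pairs of reported CPU % and number of running threads), the function names, and the sample values are hypothetical; only the grouping to the nearest 10% and the normalization of Equation 3 come from the description above.

```python
import math
from collections import Counter, defaultdict


def nearest_ten(cpu_pct: float) -> int:
    """Group a non-negative reported CPU % to the nearest 10% (halves round up)."""
    return int(math.floor(cpu_pct / 10.0 + 0.5)) * 10


def estimate_distribution(samples):
    """Estimate p(N, CPU %) from (reported_cpu_pct, running_threads) samples.

    Returns {cpu_bucket: {N: probability}}; each inner dict sums to 1 (Eq. 3).
    """
    counts = defaultdict(Counter)
    for cpu_pct, n_running in samples:
        counts[nearest_ten(cpu_pct)][n_running] += 1

    distribution = {}
    for bucket, counter in counts.items():
        total = sum(counter.values())
        distribution[bucket] = {n: c / total for n, c in counter.items()}
    return distribution


# Hypothetical samples: (reported CPU %, number of running threads).
samples = [(48, 2), (52, 2), (51, 1), (49, 3), (22, 1), (18, 1), (21, 0)]
dist = estimate_distribution(samples)
print(dist[50])
```

Relative frequencies are used here as the probability estimate, which is the simplest choice; a real implementation might smooth or pool buckets with few samples.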

The processor throughput, or Throughput %, was then calculated according to Equation 4 below, where U % is a shorthand version of the Unshared SMT % of Equation 1, N is the number of running threads, P is the number of physical processors, and L is the number of logical processors.

$$\begin{aligned}
\text{Throughput\%} ={}& \sum_{0 \le N \le P} p(N, \text{CPU\%}) \cdot \frac{U\% \cdot N}{P} \\
&+ \sum_{P < N \le L} p(N, \text{CPU\%}) \cdot \frac{U\% \cdot (L - N) + 100\% \cdot (N - P)}{P}
\end{aligned} \qquad \text{(Eq. 4)}$$

In Equation 4, the first line (the sum over 0≦N≦P) represents periods during the measurement interval when each thread was assigned its own physical processor, and the processor was delivering between 0% and U % of its maximum throughput. The second line (the sum over P<N≦L) represents periods when all physical processors were busy, and some processors had to be shared, with the L−N unshared processors delivering U % of their maximum throughput, and the N−P shared processors delivering their full maximum throughput. Dividing the results by P converts the units from fractional processors to fractions of the whole server.
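Equation 4 can be evaluated directly once a distribution is available. The sketch below assumes the distribution for one CPU % level is supplied as a dictionary mapping N to its probability; the function name and argument names are illustrative only. With a flat distribution over the five states of a server with two physical SMT processors, it reproduces the roughly 63% average throughput shown in Table 1.

```python
def throughput_pct(p_given_cpu, physical: int, logical: int, unshared_pct: float) -> float:
    """Evaluate Eq. 4 for one reported CPU % level.

    p_given_cpu maps N (running threads) to p(N, CPU %); unshared_pct is U %.
    """
    total = 0.0
    for n, prob in p_given_cpu.items():
        if not 0 <= n <= logical:
            raise ValueError(f"N={n} is outside 0..{logical}")
        if n <= physical:
            # First line of Eq. 4: every running thread has its own physical CPU.
            state = unshared_pct * n / physical
        else:
            # Second line: L-N unshared CPUs at U %, N-P shared CPUs at 100%.
            state = (unshared_pct * (logical - n) + 100.0 * (n - physical)) / physical
        total += prob * state
    return total


# Flat distribution over 0..4 running threads, P=2, L=4, U % = 100/120.
flat = {n: 1.0 / 5.0 for n in range(5)}
u_pct = 100.0 * 100.0 / 120.0
print(round(throughput_pct(flat, physical=2, logical=4, unshared_pct=u_pct), 1))
```

The unrounded per-state values (about 41.7%, 83.3%, and 91.7%) differ slightly from the table's figures, which appear to start from U % rounded to 83%.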

Applying Equation 4 to the data collected for the various servers above and plotting the results for each server resulted in the graphs shown in FIG. 2. In FIG. 2, the horizontal axis represents the processor throughput as reported by WINDOWS SERVER 2003 (e.g., server operating system), and the vertical axis represents “measured” processor throughput (normalized) insofar as the throughputs are based on collected data. The topmost curve 200 and bottommost curve 202 represent the theoretical maximum overstatement and maximum understatement of processor throughput, assuming a 20% SMT benefit.

As can be seen, while the largest absolute distortion between reported CPU usage and actual throughput occurs at 50% CPU usage, the greatest relative distortion occurs at very low CPU usage. For example, a server that is reported as having only 20% CPU usage is likely to be using 33% of its potential throughput because at such low CPU usage, physical CPUs are almost never shared. The distortion percentage can be obtained by dividing the 33% throughput by the 20% reported CPU usage, which gives a ratio of about 1.67, that is, a distortion of 67%.
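The low-usage case can be worked through numerically. The sketch below assumes, as the passage does, that physical CPUs are never shared, so that each running thread delivers U % of one physical processor; it is therefore only meaningful for reported usage of 50% or less. The function names are hypothetical.

```python
def unshared_throughput_pct(reported_pct: float, unshared_pct: float) -> float:
    """Throughput when no physical CPU is shared (two logical CPUs per physical).

    Reported CPU % = 100*N/L and throughput = U%*N/P with L = 2P, so
    throughput = 2 * reported * U% / 100. Valid only for reported_pct <= 50.
    """
    if not 0.0 <= reported_pct <= 50.0:
        raise ValueError("the never-shared assumption requires 0 <= reported <= 50")
    return 2.0 * reported_pct * unshared_pct / 100.0


def relative_distortion_pct(reported_pct: float, throughput: float) -> float:
    """Relative distortion: how far throughput exceeds reported usage, in percent."""
    return 100.0 * (throughput / reported_pct - 1.0)


u_pct = 100.0 * 100.0 / 120.0  # Unshared SMT % for a 20% SMT Benefit %
actual = unshared_throughput_pct(20.0, u_pct)
print(round(actual, 1), round(relative_distortion_pct(20.0, actual), 1))
```

Under this assumption the relative distortion is the same for every reported usage in the range, which is why it dominates at low usage where sharing is rare.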

Because of the distortion, capacity planners must be careful in predicting computer or server capacity, particularly at low reported utilizations. The large relative distortion that exists at low utilizations may lull an unwary capacity planner into thinking, for example, that a reported 20% CPU usage means only ⅕ of a server is being used, when in reality about ⅓ of it is actually being used. The capacity planner would need to account for the distortion before basing any capacity decisions and/or business decisions on the compensated CPU usage. Thus, for example, the capacity planner may need to adjust projections of CPU usage that may be needed going forward if the current CPU usage or the projected CPU usage involves SMT processors.

Note in FIG. 2 that, despite the varied and diverse natures of the fifteen servers and the applications running thereon, the individual curves are relatively close to one another and have the same general shape. Based on this closeness and similarity of shape, it is possible to derive a single curve that could be used to represent the various curves of the present group of servers. Other representative curves may be derived for other groups of servers, depending on the applications running on the servers. Such a representative curve, indicated by the thick line labelled 204, may be derived, for example, by taking an average of the individual curves. Various curve fitting techniques known to those having ordinary skill in the art, such as regression analysis, may then be used to develop an equation for the representative curve 204. Such an equation mathematically describes the representative curve 204 and may be used as an adjustment factor for any one of the servers. Specifically, the equation/adjustment factor may produce a new CPU usage that compensates for the distortion in the reported CPU usage provided by WINDOWS SERVER 2003 (e.g., server operating system) (and similar thread-aware operating systems).
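The averaging step for the representative curve might look like the following sketch. It assumes each server's curve is available as a dictionary from reported CPU % level to measured throughput %, and the two sample curves are invented placeholders rather than data from FIG. 2. The subsequent curve fitting step (e.g., regression) is not shown.

```python
from statistics import fmean


def representative_curve(curves):
    """Pointwise average of per-server curves.

    Each curve maps a reported CPU % level to a measured throughput %.
    Only levels present in every curve are averaged.
    """
    if not curves:
        raise ValueError("at least one curve is required")
    common_levels = sorted(set.intersection(*(set(c) for c in curves)))
    return {level: fmean(c[level] for c in curves) for level in common_levels}


# Hypothetical curves for two servers (placeholder values, not from FIG. 2).
server_a = {30: 45.0, 40: 58.0, 50: 70.0}
server_b = {30: 47.0, 40: 60.0, 50: 72.0, 60: 80.0}
print(representative_curve([server_a, server_b]))
```

Restricting the average to levels common to all curves avoids biasing the result toward servers that happened to be sampled over a wider usage range.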

In some embodiments, however, instead of taking an average of the individual curves in FIG. 2, an equation may be derived to specifically describe each individual curve mathematically using, again, well-known curve fitting techniques. Because each curve represents one server, each equation would be applicable only to that server. The server-specific equations may then be used to calculate a compensated CPU usage for each server. Of course, to ensure the reliability of the server-specific compensations, care should be taken by those having ordinary skill in the art to make sure the underlying data for each server is obtained over a sufficiently long time and/or using a sufficiently large number of samples to produce a statistically valid result.

In some embodiments, instead of using an equation as the adjustment factor to calculate a compensated CPU usage, a lookup table may be created from the data. Such a lookup table may describe the representative curve 204 (or server-specific curves) numerically instead of mathematically. The CPU usage may then be looked up in the lookup table to determine the compensated value.
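A lookup-table adjustment factor could be applied as in the sketch below. Linear interpolation between table entries is an assumption added here for reported values that fall between entries; the text does not say how such values are handled. The table values are placeholders, with only the 20% to 33% pair echoing the example discussed earlier.

```python
from bisect import bisect_right

# Hypothetical (reported CPU %, compensated CPU %) pairs, sorted by reported %.
# These are placeholder values, not data read from the representative curve 204.
LOOKUP_TABLE = [(0.0, 0.0), (20.0, 33.0), (50.0, 70.0), (80.0, 90.0), (100.0, 100.0)]


def compensate(reported_pct: float, table=LOOKUP_TABLE) -> float:
    """Return the compensated CPU % for a reported CPU % using the lookup table.

    Values between table entries are linearly interpolated (an assumption).
    """
    reported_levels = [x for x, _ in table]
    if not reported_levels[0] <= reported_pct <= reported_levels[-1]:
        raise ValueError("reported CPU % is outside the lookup table range")
    i = bisect_right(reported_levels, reported_pct)
    if i == len(table):
        return table[-1][1]  # exactly the last entry
    x0, y0 = table[i - 1]
    x1, y1 = table[i]
    return y0 + (y1 - y0) * (reported_pct - x0) / (x1 - x0)


print(compensate(20.0))  # an exact table entry
print(compensate(35.0))  # interpolated between the 20% and 50% entries
```

A table is convenient when the curve has no simple closed form, at the cost of needing enough entries that interpolation error stays small.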

In accordance with the disclosed embodiments, each one of the front-end server 102, backend server 104, and/or database 106 (FIG. 1) that uses an SMT processor may be configured to compensate for distortions in reported CPU usage. FIG. 3 illustrates an example of the front-end server 102 being configured in this manner. Any suitable computer known to those having ordinary skill in the art may be used as the front-end server 102, including a personal computer, workstation, mainframe, and the like. The backend server 104 and/or database 106 may have a similar design and are therefore not described in detail here.

The front-end server 102 typically includes a bus 302 or other communication mechanism for communicating information and an SMT processor 304 coupled with the bus 302 for processing information. The front-end server 102 may also include a main memory 306, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 302 for storing computer-readable instructions to be executed by the SMT processor 304. The main memory 306 may also be used for storing temporary variables or other intermediate information during execution of the instructions to be executed by the SMT processor 304. The front-end server 102 may further include a read-only memory (ROM) 308 or other static storage device coupled to the bus 302 for storing static information and instructions for the SMT processor 304. A computer-readable storage device 310, such as a magnetic, optical, or solid state device, may be coupled to the bus 302 for storing information and instructions for the SMT processor 304.

The front-end server 102 may be coupled via the bus 302 to a display 312, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a user. An input device 314, including, for example, alphanumeric and other keys, may be coupled to the bus 302 for communicating information and command selections to the SMT processor 304. Another type of user input device may be a cursor control 316, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to the SMT processor 304, and for controlling cursor movement on the display 312. The cursor control 316 typically has two degrees of freedom in two axes, a first axis (e.g., X axis) and a second axis (e.g., Y axis), that allow the device to specify positions in a plane.

The term “computer-readable instructions” as used above refers to any instructions that may be performed by the SMT processor 304 and/or other components. Similarly, the term “computer-readable medium” refers to any storage medium that may be used to store the computer-readable instructions. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media may include, for example, optical or magnetic disks, such as the storage device 310. Volatile media may include dynamic memory, such as main memory 306. Transmission media may include coaxial cables, copper wire and fiber optics, including wires of the bus 302. Transmission media may also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media may include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.

Various forms of the computer-readable media may be involved in carrying one or more sequences of one or more instructions to the SMT processor 304 for execution. For example, the instructions may initially be borne on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the front-end server 102 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to the bus 302 can receive the data carried in the infrared signal and place the data on the bus 302. The bus 302 carries the data to the main memory 306, from which the SMT processor 304 retrieves and executes the instructions. The instructions received by the main memory 306 may optionally be stored on the storage device 310 either before or after execution by the SMT processor 304.

The front-end server 102 may also include a communication interface 318 coupled to the bus 302. The communication interface 318 typically provides a two-way data communication coupling between the front-end server 102 and the network 110. For example, the communication interface 318 may be an integrated services digital network (ISDN) card or a modem used to provide a data communication connection to a corresponding type of telephone line. As another example, the communication interface 318 may be a local area network (LAN) card used to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. Regardless of the specific implementation, the main function of the communication interface 318 is to send and receive electrical, electromagnetic, optical, or other signals that carry digital data streams representing various types of information.

In accordance with the disclosed embodiments, a processor usage tuner 320, or more accurately, the computer-readable instructions therefor, may reside on the storage device 310. Additionally, an adjustment factor utility 322 may also reside on the storage device 310 in some embodiments. The processor usage tuner 320 may then be executed to compensate for any distortion that may be present in the CPU usage reported by WINDOWS SERVER 2003 (e.g., server operating system) (and similar operating systems) for the front-end server 102. The compensation may be based on a generic adjustment factor applicable to multiple servers, or it may be based on a particular adjustment factor that is specific to the front-end server 102. Similarly, the adjustment factor utility 322 may be executed from time to time or as needed to generate an adjustment factor for the front-end server 102. The adjustment factor may then be used by itself as a server-specific adjustment factor, or it may be combined with results from other adjustment factor utilities 322 to derive a generic adjustment factor. The generic adjustment factor may then be inputted to the processor usage tuner 320 and used as needed for distortion compensation.

It should be noted that, in some embodiments, instead of residing on the storage device 310 of the front-end server 102, either the processor usage tuner 320 or adjustment factor utility 322, or both, may instead be run from a central location on the network 110. In terms of programming language, both the processor usage tuner 320 and adjustment factor utility 322 may also be implemented using any suitable programming language known to those having ordinary skill in the art, including Java, C++, Visual Basic, and the like.

Referring now to FIG. 4, in one embodiment, the processor usage tuner 320 may comprise a number of functional components, including a user interface module 400, a processor usage acquisition module 402, a processor usage correction module 404, and a data compilation module 406. Other functional components may also be added to or removed from the processor usage tuner 320 without departing from the scope of the disclosed embodiments. Note that although the various functional components 400-406 of the processor usage tuner 320 have been shown as discrete units in FIG. 4, those having ordinary skill in the art will understand that two or more of these components may be combined into a single component, and that any individual component may be divided into several constituent components, without departing from the scope of the disclosed embodiments.

In general, the user interface module 400 is responsible for allowing the user to interact with the various functional components of the processor usage tuner 320 as needed. To this end, the user interface module 400 may provide a graphical user interface for receiving input from the user. Such input may include, for example, an adjustment factor that the processor usage tuner 320 may use to compensate for distortions in the estimated CPU usage reported by thread-aware operating systems like WINDOWS SERVER 2003 (e.g., server operating system). More precisely, the input may be an equation or a lookup table that may then be used as the adjustment factor to compensate for distortions in the CPU usage estimate. The equation or lookup table may be a standard one applicable to multiple servers, or it may be a server-specific adjustment factor applicable to a particular server only. In some embodiments, instead of manual user input, it is also possible for the equation or lookup table to be provided to the processor usage tuner 320 automatically, for example, from a predefined database containing generic and/or server-specific equations and/or lookup tables for various servers. In any event, in addition to receiving inputs, the graphical user interface may also be used to present the results of the compensated CPU usage to the user.

The processor usage acquisition module 402 may function to acquire the CPU usage estimates made by thread-aware operating systems like WINDOWS SERVER 2003 (e.g., server operating system). In one implementation, the processor usage acquisition module 402 may be configured to invoke or otherwise call an appropriate process of the operating system, such as “Perfmon,” “Typeperf,” and other performance monitoring processes. The processor usage acquisition module 402 may then obtain an estimate for the CPU usage from the operating system process invoked. It is also possible for the processor usage acquisition module 402 to invoke the operating system's thread state counter to determine the thread state of each processor (i.e., whether a constructive thread is running on the processor). The processor usage acquisition module 402 may thereafter use the thread state to produce its own estimate of CPU usage in a manner similar to that employed by the operating system.

The estimate of CPU usage may then be provided to the processor usage correction module 404 for compensation of any distortion in the estimates provided by the operating system. In one embodiment, the processor usage correction module 404 may perform this compensation by applying the equation provided via the user interface module 400 (or obtained automatically from a repository) to the CPU usage estimate. The processor usage correction module 404 may also perform the compensation by looking up the compensated estimate in a lookup table provided via the user interface module 400 (or obtained automatically from a repository). In either case, the processor usage correction module 404 may thereafter provide the compensated estimate to the user interface module 400 for presentation to a user as needed or requested.
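By way of illustration only, the two compensation paths described for the processor usage correction module 404 (applying an equation, or looking up a compensated estimate in a table) might be sketched as follows. The function names, the quadratic `hypothetical_equation`, and the table values are assumptions made for this example and are not taken from the disclosure.

```python
from typing import Callable, Mapping


def compensate_with_equation(reported_pct: float,
                             equation: Callable[[float], float]) -> float:
    """Apply an adjustment equation to a reported CPU usage (percent).

    The result is clamped to the 0..100 range so that a poorly fitted
    equation cannot yield an impossible usage value.
    """
    return min(100.0, max(0.0, equation(reported_pct)))


def compensate_with_table(reported_pct: float,
                          table: Mapping[int, float]) -> float:
    """Return the compensated estimate stored for the tabulated
    reported-usage key closest to reported_pct."""
    nearest_key = min(table, key=lambda key: abs(key - reported_pct))
    return table[nearest_key]


# Hypothetical equation and table, for illustration only.
def hypothetical_equation(u: float) -> float:
    return 1.5 * u - 0.005 * u * u


hypothetical_table = {0: 0.0, 50: 62.5, 100: 100.0}

if __name__ == "__main__":
    print(compensate_with_equation(50.0, hypothetical_equation))
    print(compensate_with_table(48.0, hypothetical_table))
```

Either function may be substituted for the other without changing the caller, which mirrors the interchangeable use of an equation or a lookup table described above.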

Some or all of the compensated CPU usage estimates may also be collected and stored by the data compilation module 406. The data compilation module 406 may function to store the compensated CPU usage estimates along with various information therefor, such as time and date, server and/or processor name, and the like, in a usage data repository (not expressly shown). This compilation function may be performed for a certain time interval of interest, such as during periods of particularly heavy processing load, or it may be performed on an ongoing basis. The compiled data may then be analyzed for historical trends and usage patterns for the front-end server 102. Similar data may be compiled for other front-end servers 102, backend servers 104, and/or databases 106 in order to construct a more comprehensive picture of CPU usage. Such data may then be used to support capacity planning decisions, equipment cost allocations, and the like.
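As a non-limiting sketch, the compilation function of the data compilation module 406 might be approximated as shown below. The record layout (`UsageRecord`), the CSV format, and the helper names are assumptions chosen for this example; the disclosure does not specify a repository format.

```python
import csv
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, TextIO


@dataclass(frozen=True)
class UsageRecord:
    """One compensated estimate plus the descriptive information stored with it."""
    timestamp: datetime
    server: str
    reported_pct: float
    compensated_pct: float


def in_interval(records: Iterable[UsageRecord],
                start: datetime, end: datetime) -> List[UsageRecord]:
    """Keep only the records whose timestamp falls in [start, end)."""
    return [r for r in records if start <= r.timestamp < end]


def write_records(records: Iterable[UsageRecord], stream: TextIO) -> None:
    """Write a header row followed by one CSV row per record."""
    writer = csv.writer(stream)
    writer.writerow(["timestamp", "server", "reported_pct", "compensated_pct"])
    for r in records:
        writer.writerow([r.timestamp.isoformat(), r.server,
                         r.reported_pct, r.compensated_pct])
```

Filtering by interval before writing corresponds to compiling data only for an interval of interest; omitting the filter corresponds to compiling on an ongoing basis.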

As for the adjustment factor, the adjustment factor utility 322 may be configured to determine this adjustment factor, as illustrated in FIG. 5. Like the processor usage tuner 320, the adjustment factor utility 322 may comprise a number of functional components, including a user interface module 500, a thread state acquisition module 502, a data processing module 504, and a curve fitting module 506. And as with the processor usage tuner 320, other functional components may be added to or removed from the adjustment factor utility 322 without departing from the scope of the disclosed embodiments. Note that although the adjustment factor utility 322 and the processor usage tuner 320 are both shown on the front-end server 102 (FIG. 3), one may certainly be present on the front-end server 102 without the other, and vice versa, without departing from the scope of the disclosed embodiments.

In general, the user interface module 500 is responsible for allowing the user to interact with the various functional components of the adjustment factor utility 322 as needed. To this end, the user interface module 500 may provide a graphical user interface for receiving input from the user. Such input may include, for example, the frequency with which to sample thread states, the overall duration in which to obtain samples, and the like. Any suitable style of graphical user interface may be used, as the particular design, layout, color scheme, and so forth, are not overly important to the practice of the disclosed embodiments. In addition to receiving inputs, the graphical user interface of the user interface module 500 may also present the results of the adjustment factor determination to the user in some embodiments.

The thread state acquisition module 502 may function to acquire the states of the various threads (e.g., whether they are idle or being run by the logical processors) along with the level of CPU usage reported by the operating system. Recall from the discussion with respect to the graph in FIG. 2 above that the CPU usage reported by the operating system may be distorted, and that the distribution of the number of threads running on the logical processors may be used to determine an adjustment factor. In one implementation, the thread state acquisition module 502 may invoke or otherwise call a thread state counter, such as the one provided with WINDOWS SERVER 2003 (e.g., server operating system). The thread state acquisition module 502 may then acquire the thread state for each logical processor of the physical SMT processor. This thread state acquisition may be performed at a sufficiently high frequency and over a sufficiently long interval so as to generate enough data for those having ordinary skill in the art to consider statistically valid. For example, the thread state acquisitions may be performed three times per second, twice per second, once every second, once every two seconds, once every three seconds, and so on. Similarly, the acquisition interval may be a quarter of an hour, half an hour, one hour, two hours, three hours, various combinations thereof, and so on.
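For illustration, a sampling loop of the kind performed by the thread state acquisition module 502 might look like the following sketch. The callables `read_usage` and `read_thread_states` are hypothetical stand-ins for whatever operating system counters are actually invoked; they are not real operating system interfaces, and the representation of a thread state as a boolean per logical processor is likewise an assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple

# One sample: (reported CPU usage in percent,
#              one flag per logical processor, True if a thread is running)
Sample = Tuple[float, Tuple[bool, ...]]


def collect_samples(read_usage: Callable[[], float],
                    read_thread_states: Callable[[], Sequence[bool]],
                    period_s: float,
                    duration_s: float,
                    sleep: Callable[[float], None] = time.sleep) -> List[Sample]:
    """Read usage and thread states together, once per period_s seconds,
    until roughly duration_s seconds of samples have been gathered."""
    sample_count = int(round(duration_s / period_s))
    samples: List[Sample] = []
    for _ in range(sample_count):
        # Both values are read back to back so that the thread states
        # substantially correspond to the reported usage.
        samples.append((read_usage(), tuple(read_thread_states())))
        sleep(period_s)
    return samples
```

With `period_s=1.0` and `duration_s=3600.0`, the loop would gather 3600 samples, matching the once-per-second, one-hour example given above. The `sleep` parameter exists only so that the loop can be exercised without waiting.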

The data processing module 504 may then be used to process the data acquired by the thread state acquisition module 502. In one implementation, the data processing module 504 may disregard threads that are not considered to be legitimate threads, such as overhead threads resulting from the use of the thread state counter, as well as idle threads. The data processing module 504 may then group the reported CPU usage according to some predetermined criterion, for example, the nearest 10%. The resulting distribution may represent a fairly accurate estimate of the actual distribution of the number of running threads for various levels of reported CPU usage.
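The filtering and grouping performed by the data processing module 504 might be sketched as follows. The state labels `"running"`, `"idle"`, and `"overhead"`, the sample layout, and the choice to round ties upward are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def nearest_ten(reported_pct: float) -> int:
    """Round a non-negative usage percentage to the nearest 10%, ties upward."""
    return min(100, int(reported_pct / 10.0 + 0.5) * 10)


def thread_distribution(
    samples: Iterable[Tuple[float, Sequence[str]]]
) -> Dict[int, Dict[int, int]]:
    """Map each 10% usage bucket to a histogram of running-thread counts.

    Only threads labeled "running" are counted; "idle" and "overhead"
    threads are disregarded as not legitimate.
    """
    distribution: Dict[int, Counter] = defaultdict(Counter)
    for reported_pct, states in samples:
        running = sum(1 for state in states if state == "running")
        distribution[nearest_ten(reported_pct)][running] += 1
    return {bucket: dict(hist) for bucket, hist in distribution.items()}
```

Each inner histogram records how often a given number of legitimate threads was observed at that level of reported CPU usage, which is the distribution the curve fitting module 506 subsequently operates on.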

Once the data has been processed, an adjustment factor for the front-end server 102 may then be determined by the curve fitting module 506. In one embodiment, the curve fitting module 506 may determine the adjustment factor by applying well-known curve fitting techniques, such as regression analysis, to the data acquired by the thread state acquisition module 502. An equation may then be derived that may be used to calculate the compensated estimate of CPU usage for the front-end server 102. The equation may then be stored in an appropriate repository, provided as needed to the processor usage tuner 320 (to compensate for any distortion in the CPU usage), and the like. In some embodiments, instead of an equation, the curve fitting module 506 may simply create a lookup table using the data acquired by the thread state acquisition module 502 (and processed by the data processing module 504). The lookup table may then be stored in an appropriate repository and used to look up a compensated estimate for the CPU usage reported.
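As one possible instance of the regression analysis mentioned above, the sketch below fits a straight line by ordinary least squares and, alternatively, builds a lookup table from the same points. It assumes that each input point pairs a reported usage bucket with a compensated usage value already derived from the thread distribution; how that value is derived is outside this sketch, and a straight line is only one of many curve shapes that could be fitted.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (reported usage %, compensated usage %)


def fit_line(points: Sequence[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept.

    Requires at least two points with distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def build_lookup_table(points: Sequence[Point]) -> Dict[int, float]:
    """Alternative to an equation: store the compensated value per bucket."""
    return {int(x): y for x, y in points}
```

The returned slope and intercept define the equation that may be stored in a repository; the dictionary returned by `build_lookup_table` serves the same role when a table is preferred.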

Thus far, specific embodiments have been disclosed for providing compensated estimates of CPU usage. Referring now to FIGS. 6-7, general guidelines are shown in the form of methods that may be used to implement the various embodiments disclosed above. As can be seen in FIG. 6, a method 600 of determining an adjustment factor for an SMT processor begins at block 602, where thread states and CPU usage data are acquired, for example, by invoking a thread state counter of the operating system. The data may be acquired at a predetermined frequency and over a predetermined interval, such as once per second over a one-hour interval. At block 604, a determination is made as to whether the threads are legitimate threads; that is, the threads are not overhead threads or idle threads. If the answer is no, then the threads are discarded at block 606. If the answer is yes, then the method 600 continues at block 608, where the CPU usage estimates are grouped according to a predetermined criterion, for example, the nearest 10%. The resulting distribution is believed to represent a fairly accurate estimate of the actual distribution of the number of running threads for various levels of reported CPU usage. Curve fitting techniques may then be applied at block 610 to the distribution of threads to determine an equation for the distribution. The equation may then be used to calculate an adjustment factor. Alternatively, instead of an equation, a lookup table may be created from the threads and used to look up the adjustment factor.

FIG. 7 illustrates a method 700 of compensating for distortions in the reported CPU usage according to the disclosed embodiments. The method 700 begins at block 702, where an adjustment factor, or rather the equation or lookup table that may be used to derive the adjustment factor, may be obtained. The equation or lookup table may be a standard one applicable to multiple servers (and the SMT processors therein), or it may be a server-specific equation or lookup table applicable to a particular server only. At block 704, a CPU usage estimate is acquired, for example, by invoking a special process (e.g., “Typeperf,” “Perfmon,” etc.) of the operating system designed for monitoring the performance of the processor. At block 706, an adjustment factor is calculated for the CPU usage estimate, for example, from the equation or the lookup table. The adjustment factor is then applied to the reported CPU usage estimate to obtain a compensated estimate at block 708. At block 710, the compensated estimate may be stored for subsequent use to detect historical trends and usage patterns in support of any capacity planning, cost allocation, or the like.
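The flow of blocks 704 through 710 might be sketched as follows. This example assumes that the adjustment factor is a multiplier applied to the reported estimate, which is only one way of "applying" a factor; the callables `factor_for` and `read_usage` and the list used as a store are hypothetical placeholders for the equation or lookup table of block 702, the operating system monitoring process, and the usage data repository, respectively.

```python
from typing import Callable, List, Tuple

# (reported %, adjustment factor, compensated %)
StoredEstimate = Tuple[float, float, float]


def compensate_once(factor_for: Callable[[float], float],
                    read_usage: Callable[[], float],
                    store: List[StoredEstimate]) -> float:
    """Run one pass of blocks 704-710 and return the compensated estimate."""
    reported = read_usage()                       # block 704: acquire estimate
    factor = factor_for(reported)                 # block 706: calculate factor
    compensated = min(100.0, reported * factor)   # block 708: apply (assumed multiplicative)
    store.append((reported, factor, compensated)) # block 710: store for later analysis
    return compensated
```

Calling `compensate_once` repeatedly with the same `store` accumulates the history from which trends and usage patterns may later be derived.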

While the disclosed embodiments have been described with reference to one or more particular implementations, those skilled in the art will recognize that many changes may be made thereto. Therefore, each of the foregoing embodiments and obvious variations thereof is contemplated as falling within the spirit and scope of the disclosed embodiments, which are set forth in the following claims.

What is claimed is:
 1. A computing system, comprising: a simultaneous multi-threading (SMT) processor and a memory for storing computer-readable instructions executable by the SMT processor; at least one module deployed in the memory and executed by the SMT processor to generate an adjustment factor for the SMT processor; at least one module deployed in the memory and executed by the SMT processor to acquire an SMT usage estimate for the SMT processor made by a thread-aware operating system; and at least one module deployed in the memory and executed by the SMT processor to determine a compensated SMT usage value based on: the SMT usage estimate; and the adjustment factor for the SMT processor or a lookup table from a repository of compensated estimate lookup tables.
 2. The computing system of claim 1, wherein the adjustment factor for the SMT processor includes an equation for a representative curve of a number of thread state distribution curves.
 3. The computing system of claim 1, including at least one module deployed in the memory and executed by the SMT processor to obtain the compensated SMT usage value for the SMT processor by applying the adjustment factor for the SMT usage estimate to the SMT usage estimate, wherein the compensated SMT usage value compensates for any distortions in the SMT usage estimate.
 4. The computing system of claim 2, including at least one module deployed in the memory and executed by the SMT processor to store the compensated SMT usage value for the SMT processor in the memory.
 5. The computing system of claim 1, wherein the compensated SMT usage value compensates for any distortions in the SMT usage estimate.
 6. The computing system of claim 1, including at least one module deployed in the memory and executed by the SMT processor to: acquire a sampling of reported SMT usage for the SMT processor; acquire thread states for the SMT processor that correspond to the sampling of reported SMT usage for the SMT processor; and obtain the adjustment factor for the SMT processor using the sampling of reported SMT usage and the thread states for the SMT processor.
 7. The computing system of claim 1, including at least one module deployed in the memory and executed by the SMT processor to acquire the SMT usage estimate from an operating system running on the computing system.
 8. A method, comprising: generating via a simultaneous multi-threading (SMT) processor, an adjustment factor for the SMT processor; acquiring an SMT usage estimate for the SMT processor made by a thread-aware operating system; and determining a compensated SMT usage value based on: the SMT usage estimate; and the adjustment factor for the SMT processor or a lookup table from a repository of compensated estimate lookup tables.
 9. The method of claim 8, wherein the adjustment factor for the SMT processor includes an equation for a representative curve of a number of thread state distribution curves.
 10. The method of claim 8, wherein the method includes obtaining the compensated SMT usage value for the SMT processor by applying the adjustment factor for the SMT usage estimate to the SMT usage estimate, wherein the compensated SMT usage value compensates for any distortions in the SMT usage estimate.
 11. The method of claim 10, wherein the method includes storing the compensated SMT usage value for the SMT processor in a memory of a computing system.
 12. The method of claim 8, wherein the compensated SMT usage value compensates for any distortions in the SMT usage estimate.
 13. The method of claim 8, wherein the method includes: acquiring a sampling of reported SMT usage for the SMT processor; acquiring thread states for the SMT processor that correspond to the sampling of reported SMT usage for the SMT processor; and obtaining the adjustment factor for the SMT processor using the sampling of reported SMT usage and the thread states for the SMT processor.
 14. The method of claim 8, wherein the method includes acquiring the SMT usage estimate from an operating system running on a computing system.
 15. A non-transitory computer-readable medium storing computer-readable instructions executable by a simultaneous multi-threading (SMT) processor to: generate an adjustment factor for the SMT processor; acquire an SMT usage estimate for the SMT processor made by a thread-aware operating system; and determine a compensated SMT usage value based on: the SMT usage estimate; and the adjustment factor for the SMT processor or a lookup table from a repository of compensated estimate lookup tables.
 16. The non-transitory computer-readable medium of claim 15, wherein the adjustment factor for the SMT processor includes an equation for a representative curve of a number of thread state distribution curves.
 17. The non-transitory computer-readable medium of claim 15, further comprising instructions executed to obtain the compensated SMT usage value for the SMT processor by applying the adjustment factor for the SMT usage estimate to the SMT usage estimate, wherein the compensated SMT usage value compensates for any distortions in the SMT usage estimate.
 18. The non-transitory computer-readable medium of claim 15, further comprising computer-readable instructions executed to store the compensated SMT usage value for the SMT processor in a memory of a computing system.
 19. The non-transitory computer-readable medium of claim 15, wherein the adjustment factor for the compensated SMT usage value compensates for any distortions in the SMT usage estimate.
 20. The non-transitory computer-readable medium of claim 15, further comprising computer-readable instructions executed to: acquire a sampling of reported SMT usage for the SMT processor; acquire thread states for the SMT processor that correspond to the sampling of reported SMT usage for the SMT processor; and generate the adjustment factor for the SMT processor using the sampling of reported SMT usage and the thread states for the SMT processor. 